Page et al. Systematic Reviews 2013, 2:21
http://www.systematicreviewsjournal.com/content/2/1/21






PROTOCOL Open Access 



An empirical investigation of the potential impact 
of selective inclusion of results in systematic 
reviews of interventions: study protocol 

Matthew J Page, Joanne E McKenzie, Sally E Green and Andrew B Forbes



Abstract 

Background: Systematic reviewers may encounter a multiplicity of outcome data in the reports of randomised 
controlled trials included in the review (for example, multiple measurement instruments measuring the same 
outcome, multiple time points, and final and change from baseline values). The primary objectives of this study are 
to investigate in a cohort of systematic reviews of randomised controlled trials of interventions for rheumatoid 
arthritis, osteoarthritis, depressive disorders and anxiety disorders: (i) how often there is multiplicity of outcome data 
in trial reports; (ii) the association between selection of trial outcome data included in a meta-analysis and the 
magnitude and statistical significance of the trial result; and (iii) the impact of the selection of outcome data on
meta-analytic results. 

Methods/Design: Forty systematic reviews (20 Cochrane, 20 non-Cochrane) of RCTs published from January 2010 
to January 2012 and indexed in the Cochrane Database of Systematic Reviews (CDSR) or PubMed will be randomly 
sampled. The first meta-analysis of a continuous outcome within each review will be included. From each review 
protocol (where available) and published review we will extract information regarding which types of outcome 
data were eligible for inclusion in the meta-analysis (for example, measurement instruments, time points, analyses). 
From the trial reports we will extract all outcome data that are compatible with the meta-analysis outcome as it is 
defined in the review and with the outcome data eligibility criteria and hierarchies in the review protocol. The 
association between selection of trial outcome data included in a meta-analysis and the magnitude and statistical 
significance of the trial result will be investigated. We will also investigate the impact of the selected trial result on 
the magnitude of the resulting meta-analytic effect estimates. 

Discussion: The strengths of this empirical study are that our objectives and methods are pre-specified and 
transparent. The results may inform methods guidance for systematic review conduct and reporting, particularly for 
dealing with multiplicity of randomised controlled trial outcome data. 

Background

Systematic reviewers may encounter a multiplicity of 
outcome data in the reports of randomised controlled 
trials (RCTs) included in their reviews [1-3]. For ex- 
ample, within a single RCT report there may be data for 
the outcome depression based on multiple measurement 
scales (for example, the Hamilton rating scale for de- 
pression (HRSD) and the Beck depression inventory 
(BDI)), at multiple time points (for example, weeks
three, six, and nine post intervention), and analysed in 
multiple ways (for example, as final and change from 
baseline values). When there is multiplicity of outcome 
data, the selection of data to include in the review 
should be based on a clinical or methodological rationale 
(or both), and ideally specified a priori. However, in 
some cases systematic reviewers may select results based 
on the magnitude, direction of effect, or statistical 
significance [1,3,4] (henceforth referred to as selective 
inclusion). Selective inclusion is problematic as it may 
misrepresent the available evidence, leading to selective
inclusion bias [5,6]. 

An empirical study by Tendal et al. [3] suggested that 
multiplicity of outcome data in RCTs is common and 
the choice of which result to include may affect the meta-analytic
estimate. The authors investigated the extent of
three sources of multiplicity - measurement instruments, 
time points, and intervention groups - in 83 RCTs in- 
cluded in 19 Cochrane reviews reporting a standardised 
mean difference (SMD) meta-analysis. In 18 (of 19) 
meta-analyses, at least one type of multiplicity was found 
in at least one included RCT. After extracting all RCT 
outcome data that were compatible with the inclusion 
criteria of the review protocol, Monte Carlo simulations 
were used to calculate all possible SMDs for each meta- 
analysis. The median difference between the smallest 
and largest meta-analytic SMD result was 0.40 (range 
0.04 to 0.91), suggesting potential for large and import- 
ant variability in meta-analytic results. The authors did 
not investigate whether there was an association be- 
tween the included result and its characteristics (for ex- 
ample, statistical significance, magnitude), or the impact 
of other types of multiplicity (for example, multiple analyses
such as intention-to-treat and per-protocol) [3].

Concerns about the potential for selective inclusion 
have led to initiatives to minimise its occurrence. The 
Cochrane Collaboration Methodological Expectations 
for Cochrane Intervention Reviews (MECIR) initiative 
and the Institute of Medicine Committee on Standards 
for Systematic Reviews of Comparative Effectiveness Re- 
search have recently published guidance recommending 
that systematic reviewers report detailed protocols that 
pre-specify eligible outcome measurement instruments 
and time points for inclusion in the review [7,8]. An op- 
tional field to provide information on eligible measure- 
ment instruments and time points is also available on 
the registration form of PROSPERO, an international 
online prospective register of systematic reviews launched 
in February 2011 [9,10]. Tendal et al. also recommend 
that systematic reviewers pre-specify a hierarchy of 
measurement instruments and time points when multi- 
plicity of outcome data is anticipated (for example, 
pre-specifying that HRSD data will be included in a 
meta-analysis of depression if both HRSD and BDI data 
are reported in studies) [3]. Tendal et al. suggested that 
systematic reviewers have not consistently reported 
such detailed protocols. For example, while all of the 19 
Cochrane protocols reported eligible measurement in- 
struments, none reported a hierarchy of measurement 
instruments, eight (42%) reported eligible time points 
and only one (5%) reported a hierarchy of time points 
[3]. These protocols were published prior to 2006 and 
no studies have since assessed the frequency of pre-specification
of these and other types of outcome data
eligibility criteria and hierarchies (for example, prefer- 
ring adjusted rather than unadjusted effect estimates). 
Furthermore, no studies have assessed whether system- 
atic review protocols affect selective inclusion of results 
in systematic reviews. 

Another initiative that may minimise potential selective 
inclusion of results is the development of standardised sets 
of outcomes (known as core outcome sets) to collect in 
clinical trials of a specific condition [11]. Establishing a 
core outcome set in RCTs can inform which outcomes 
should be included in systematic reviews [12,13]. The earliest
core outcome sets were developed in the 1990s, for
rheumatoid arthritis (RA) and osteoarthritis (OA)
[14-18]. Through the work of the Core Outcome
Measures in Effectiveness Trials (COMET) initiative [19], 
core outcome sets are currently being developed for a 
range of other conditions. In addition to the core outcomes 
in RA and OA studies, recommended measurement in- 
struments are also available (for example, the Health As- 
sessment Questionnaire to measure function in RA RCTs 
[20], and a hierarchy of pain measurement instruments for 
use in OA systematic reviews, where a global pain score is 
preferred over a pain on walking score if data for both in- 
struments are available in an RCT report) [21-23]. In con- 
trast, similar guidance does not exist for other conditions 
that have neither agreed core outcome sets nor core meas- 
urement instruments (for example, depressive and anxiety 
disorders) [24-26]. To date there has been no evaluation of 
whether core outcome sets affect selective inclusion of re- 
sults in systematic reviews. 

To our knowledge, no prior work has quantitatively 
assessed the evidence for potential bias in meta-analytic 
results, which can occur when reviewers selectively in- 
clude results from the set available. Quantifying this po- 
tential for bias is important as the results of meta- 
analyses are used by various stakeholders to inform clin- 
ical practice and policy decisions. The aim of this study 
is to investigate, in a cohort of systematic reviews, the 
potential impact of selective inclusion of RCT results on 
meta-analytic effects. The primary objectives of this 
study are to investigate: 1) how often there is multiplicity 
of outcome data in RCT reports (for example, arising 
from multiple measurement scales, time points, and ana- 
lyses); 2) the association between the RCT outcome data 
included in the meta-analysis and the magnitude and 
statistical significance of the RCT result, and 3) the im- 
pact of the selection of RCT outcome data on meta- 
analytic results. 

The secondary objectives are to: 1) quantify how many 
systematic review protocols report outcome data eligibil- 
ity and hierarchies, and 2) explore how potential select- 
ive inclusion of results is modified by (i) the existence of 
a systematic review protocol, and (ii) a core outcome set 
being available for the clinical condition under review. 

Methods/design

Overview of the study 

Forty systematic reviews (20 Cochrane, 20 non-Cochrane) 
of RCTs published from January 2010 to January 2012 and 
indexed in the Cochrane Database of Systematic Reviews 
(CDSR) or PubMed will be randomly sampled. The first 
meta-analysis of a continuous outcome within each review 
will be included. From each review protocol (where avail- 
able) and published review we will extract information re- 
garding which types of outcome data were eligible for 
inclusion in the meta-analysis (for example, measurement 
instruments, time points, analyses). From the RCT reports 
we will extract all outcome data that are compatible with 
the meta-analysis outcome as it is defined in the review 
and with the outcome data eligibility criteria and hierarch- 
ies in the review protocol. The association between selec- 
tion of RCT outcome data included in a meta-analysis and 
the magnitude and statistical significance of the RCT re- 
sult will be investigated. We will also investigate the im- 
pact of the selected trial result on the magnitude of the 
resulting meta-analytic effect estimates. 

Eligibility criteria 

A systematic review was defined using the definition by
Moher et al.: '... the authors' stated objective was to
summarize evidence from multiple studies, and the article
described explicit methods, regardless of the details
provided' [27]. The eligibility criteria for inclusion of
both Cochrane and non-Cochrane systematic reviews are:
1) the review was published between Issue 1, 2010
and Issue 1, 2012 in the CDSR, or between January 2010
and January 2012 in a non-Cochrane journal; 2) the review
is written in English (as we do not have the resources 
available to translate systematic reviews published in other 
languages); 3) references of all included RCTs are reported 
in the review; 4) the review evaluates the effects of any 
intervention for either RA, OA, depressive disorders (in- 
cluding major depressive disorder, dysthymic disorder, 
bipolar depression, seasonal affective disorder, and post- 
partum depression), or anxiety disorders (including gener- 
alized anxiety disorder, obsessive-compulsive disorder, 
panic disorder, phobic disorders, acute stress disorder, and 
post-traumatic stress disorder) [28], and 5) the review includes
at least one continuous outcome meta-analysis of
RCTs (for example, pain, function, number of tender or 
swollen joints, depression, anxiety, quality of life), with 
reporting of i) either the summary statistics (for example, 
mean, SD) or effect estimate and precision of each RCT 
included in the meta-analysis, and ii) the meta-analytic ef- 
fect estimate and its precision. 

We have selected these clinical areas to explore
whether the availability of a core outcome set for the
clinical condition under review (namely, RA
and OA) affects selective inclusion of results. We
will focus on continuous outcomes since there is greater 
scope for multiplicity of continuous outcomes in these 
clinical areas (for example, arising from multiple meas- 
urement instruments, final versus change from baseline 
values, adjusted versus unadjusted means, sub-scale 
scores) compared with dichotomous outcomes. Both 
Cochrane and non-Cochrane reviews will be eligible re- 
gardless of whether a published protocol for the review 
is available. Unpublished protocols will be requested 
from authors. Both new and updated reviews will be eli- 
gible. For updated reviews, the protocol drafted closest 
to the latest update will be included in this study. 

The exclusion criteria are: 1) no meta-analyses of con- 
tinuous outcomes are reported in the review; 2) results 
from non-randomised studies are included in each of 
the meta-analyses of continuous outcomes, and 3) non-standard
meta-analytical methods are used (for example,
Bayesian, multiple-treatments, or individual patient data 
meta-analyses). 

Literature search 

We will identify systematic reviews by performing an elec- 
tronic search of the CDSR and PubMed. We will use RA 
and OA search terms recommended by The Cochrane 
Collaboration Musculoskeletal Review Group [29], and de- 
pressive and anxiety disorders search terms recommended 
by The Cochrane Collaboration Depression, Anxiety and 
Neurosis Review Group [30]. For the PubMed search 
strategy we will combine the clinical search terms with a 
search filter used to identify systematic reviews in a previ- 
ous empirical study on the epidemiology and reporting 
characteristics of systematic reviews [27]. As the CDSR 
only includes records of Cochrane reviews, we will not use 
the systematic review search filter in the CDSR search 
strategy. We will limit searches to English language publi- 
cations and date of publication from 1 January 2010 to 31 
January 2012. The search strategies for both databases are 
reported in Additional file 1. 

Selection of systematic reviews 

The citations retrieved from the CDSR and PubMed 
databases will be exported to Microsoft Excel and ran- 
domly sorted using the random number generator (cita- 
tions of Cochrane reviews retrieved in the PubMed 
search will be deleted). One investigator (MJP) will read 
down the list of randomly sorted citations and screen 
the titles and abstracts, marking them as potentially eli- 
gible or ineligible. The full text of each potentially eli- 
gible systematic review will be retrieved and assessed 
against the inclusion criteria. This process will continue 
until 10 Cochrane RA or OA reviews, 10 non-Cochrane 
RA or OA reviews, 10 Cochrane depressive or anxiety 
disorders reviews, and 10 non-Cochrane depressive or 
anxiety disorders reviews, are included. Within both 
clinical categories (that is, RA or OA and depressive or
anxiety disorders), we will not constrain the selection by 
the particular clinical condition (for example, we will 
not require an equal number of reviews of depression 
and anxiety). Any difficulties with determining whether 
a systematic review meets inclusion criteria will be re- 
solved by discussion with a second researcher (JEM). 
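
The screening procedure above can be sketched in code. This is a minimal illustration only: the protocol specifies Microsoft Excel and manual screening, whereas the function below uses Python, and the dictionary keys (`id`, `source`, `area`), the `is_eligible` callback (a stand-in for the manual title, abstract and full-text assessment) and the seed are all hypothetical.

```python
import random

def sample_reviews(citations, is_eligible, quota=10, seed=2013):
    """Randomly order citations, then screen them in that order until
    each of the four strata holds `quota` eligible reviews.

    `citations`: list of dicts with hypothetical keys 'id',
    'source' ('Cochrane' / 'non-Cochrane') and 'area'
    ('RA/OA' / 'depression/anxiety').
    `is_eligible`: stand-in for the manual eligibility assessment.
    """
    rng = random.Random(seed)
    order = list(citations)
    rng.shuffle(order)  # random sort of the retrieved citations

    strata = [(s, a)
              for s in ("Cochrane", "non-Cochrane")
              for a in ("RA/OA", "depression/anxiety")]
    included = {stratum: [] for stratum in strata}

    for citation in order:
        key = (citation["source"], citation["area"])
        if key not in included:
            continue  # outside the four categories of interest
        if len(included[key]) < quota and is_eligible(citation):
            included[key].append(citation["id"])
        if all(len(ids) == quota for ids in included.values()):
            break  # all four quotas are filled
    return included
```

With a quota of 10 per stratum, this stops once 40 reviews have been included, mirroring the 10 + 10 + 10 + 10 design described above.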

Selection of continuous outcome for investigation 

We will select from each systematic review the first 
meta-analysis of a continuous outcome that meets the 
inclusion criteria (henceforth referred to as the index 
meta-analysis). The index meta-analysis may be selected 
from the abstract, summary of findings table, or results 
section of the review, depending on where the result is 
first reported in the publication. We will not constrain 
the selection based on the outcome label of the review 
(that is, primary, secondary, or unlabelled), because we 
anticipate that in some reviews the primary outcome(s) 
may be dichotomous or the primary continuous outcome
may not have been meta-analysed. We will not
constrain the selection based on the domain measured 
(for example, pain, or function). Meta-analyses will be 
eligible regardless of meta-analytic effect measure (that
is, mean difference (MD) or SMD), meta-analytical model (that is, fixed-effect
or random-effects), and number of RCTs included
(as long as at least two RCTs are included). 

Report retrieval 

We will retrieve reports of systematic reviews, review 
protocols, and RCTs using library services. Reports of 
RCTs may comprise journal articles, conference ab- 
stracts, unpublished dissertations, or regulatory agency 
or pharmaceutical company reports. For RCTs included 
in Cochrane reviews with reports written in languages 
other than English, we will request a copy of the transla- 
tion, if available, from the Cochrane Review Groups, or 
will use Google Translate. We will retrieve reports of 
RCTs included in the index meta-analysis and those 
reported by the systematic reviewers as investigating the 
same pairwise comparison but which were excluded 
from the meta-analyses (to explore whether any eligible 
outcome data may have been missed from these reports 
or potentially excluded based on the results). If more 
than one reference for an RCT was reported by the sys- 
tematic reviewers (for example, both a journal article 
and a conference abstract), we will retrieve all references 
reported. This will enable investigation of potential se- 
lective inclusion resulting from differences in results 
reported across different sources [31-33]. 

Data extraction 

One investigator (MJP) will extract data from all reviews 
and RCTs into a standardised form created in Microsoft
Excel. This form will be pilot-tested on one review from 
each of the four categories (Cochrane RA or OA review, 
non-Cochrane RA or OA review, Cochrane depression 
or anxiety disorders review, non-Cochrane depression or 
anxiety disorders review), and refined accordingly. A 
second investigator will independently extract data from 
a random sample of 10 reviews and their included RCTs. 
If many data extraction discrepancies are identified, we 
will consider undertaking double data extraction for the 
remaining reviews. Any discrepancies between the data 
extracted will be resolved through discussion or adjudi- 
cation by a third investigator if necessary. The list of 
data we will extract from the systematic review proto- 
cols, published systematic reviews, and RCTs is reported 
in Additional file 2. A brief summary is provided below. 

Data to extract from systematic review protocols 

From the systematic review protocol (where available) 
we will extract: 1) general characteristics of the review, 
including date of publication, and participants, interven- 
tions, comparisons, and outcomes of interest to the re- 
view; 2) reported outcome data eligibility criteria (for 
example, measurement scales, time points, intervention 
groups, and/or analyses), and 3) reported outcome data 
hierarchies (for example, whether final values were pre- 
ferred over change from baseline values if both are 
reported in an RCT publication). 

Data to extract from published systematic reviews 

From the published systematic review, we will extract 
the same information as from the protocols. In addition, 
we will extract information on any other outcome data 
reported in the review that are related measures of the 
index meta-analysis outcome under the same compari- 
son. For example, if the index meta-analysis outcome is 
global pain at 4 to 6 weeks, we will record whether any 
outcome data for different pain scales at different time 
points were included in the review, either in a subse- 
quent meta-analysis or in separate tables; these add- 
itional analyses also include sensitivity analyses related 
to the index meta-analysis. For the index meta-analysis, 
we will extract the following information: 1) the meas- 
urement instrument, time point of measurement, and 
intervention and comparison group for each RCT; 2) 
summary statistics for both groups in each RCT; 3) the 
MD or SMD, measures of variability, the statistical sig- 
nificance, and direction of the effect estimate for each 
RCT and for the meta-analytic effect; 4) heterogeneity 
statistics, and 5) which outcome data were obtained 
from the trialists because they were not reported in the RCT
publication, involved algebraic manipulation of statistics 
(for example, calculating SDs from reported 95% CIs of 
the mean), came from a report translated into English, 
or required a method of imputation (such as imputing a
missing SD). 
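
As an illustration of the kind of algebraic manipulation mentioned above, the sketch below back-calculates a group SD from the 95% CI of a group mean. It assumes a large-sample normal approximation (multiplier 1.96); for small samples a t-distribution multiplier would be more appropriate. The function name and the input values are hypothetical and are not taken from any review.

```python
import math

def sd_from_ci(lower, upper, n, z=1.96):
    """Back-calculate a group SD from the CI of its mean.

    Assumes the interval is mean +/- z * SE, so that
    SE = (upper - lower) / (2 * z) and SD = SE * sqrt(n).
    """
    se = (upper - lower) / (2 * z)
    return se * math.sqrt(n)

# Hypothetical example: mean reported with 95% CI 10.0 to 14.0, n = 50
sd = sd_from_ci(lower=10.0, upper=14.0, n=50)
print(f"Estimated SD: {sd:.2f}")
```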

Data to extract from RCT reports 

From the RCT reports we will extract all outcome data 
that are compatible with the index meta-analysis out- 
come as it is defined in the review and with the outcome 
data eligibility criteria and hierarchies reported in the re- 
view protocol. This could include data from multiple 
measurement instruments measuring the same outcome, 
multiple time points, multiple intervention or control 
groups, final and change from baseline values, intention- 
to-treat and per-protocol analyses, adjusted and un- 
adjusted effect estimates, and other analyses. For example, 
if the index meta-analysis is an MD meta-analysis of de- 
pression scores and the systematic reviewers report in the 
protocol that only HRSD outcome data will be included in 
a meta-analysis of depression, and specify no other out- 
come data eligibility criteria, we will extract all data for 
the HRSD (for example, all time points, adjusted and un- 
adjusted effect estimates), but no data for any other de- 
pression measurement instrument reported in the RCTs. 
Alternatively, if the index meta-analysis is an SMD meta- 
analysis of pain intensity at 12 weeks, and the systematic 
reviewers have not pre-specified any outcome data eligibil- 
ity criteria or hierarchies, we will extract all pain intensity 
data (for example, based on any measurement scale, 
intention-to-treat and per-protocol analyses) from each 
RCT at 12 weeks only. For systematic reviews without a 
protocol, we will request the unpublished protocol from 
the systematic reviewers. If one does not exist or is not 
provided, we will assume that no outcome data eligibility 
criteria or hierarchies were pre-specified, and will extract 
all outcome data from the RCTs, as long as they are com- 
patible with the index meta-analysis outcome as it is 
defined in the review (as per the second example above). 
Final and change from baseline values are a special case in 
that systematic reviewers performing an SMD meta- 
analysis of different measurement instruments should in- 
clude only final values or change from baseline values, not 
a mixture [34]. For systematic reviews that only include 
final values in an SMD meta-analysis, we will not extract 
any change from baseline values from the RCTs (and vice 
versa for systematic reviews that only include change from 
baseline values in an SMD meta-analysis). If systematic re- 
viewers include a mixture of final and change from base- 
line values in an SMD meta-analysis, we will extract both 
types of values from the RCTs. 

For each type of RCT outcome data deemed eligible 
for inclusion in the meta-analysis, we will extract: 1) the 
measurement instrument, time point of measurement, 
and intervention and comparison groups; 2) sample sizes, 
measures of central tendency, and measures of variability 
per group; 3) the effect estimate (MD or SMD) and
measures of variability, the statistical significance, and dir- 
ection of the effect estimate; 4) the baseline SD of the out- 
come per group, and 5) whether outcome data were fully 
reported in the RCT report (where fully reported is de- 
fined as reporting sufficient information to include the 
data in a meta-analysis [35]). We will use DigitizeIt 1.5.8©
software to extract outcome data presented in figure for- 
mat when the data are not available in the text of the re- 
port. We will not contact trialists for unpublished data. 

Sample size 

A study of the characteristics of meta-analyses (with at 
least two studies) contained in the January 2008 issue of 
the Cochrane Database of Systematic Reviews [36] found 
the median number of studies per meta-analysis to be 
three. Assuming three RCTs per meta-analysis, a sample 
of forty meta-analyses will provide one hundred and 
twenty RCTs. This will allow estimation of the proportion 
of RCTs with multiplicity of outcome data to within ± 9% 
of the true population percentage. This assumes a popula- 
tion proportion of 50%, a worst case scenario for the sam- 
ple size calculation. 
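
The precision figure above can be checked with a short calculation. The sketch assumes a simple normal-approximation (Wald) 95% confidence interval for a proportion and treats the 120 RCTs as independent; the protocol does not state which interval method was used, so this is an illustration, not a reproduction of the authors' calculation.

```python
import math

def ci_half_width(n, p=0.5, z=1.96):
    """Half-width of a Wald 95% confidence interval for a proportion."""
    return z * math.sqrt(p * (1.0 - p) / n)

n_rcts = 40 * 3  # 40 meta-analyses x 3 RCTs each, as assumed in the protocol
half_width = ci_half_width(n_rcts)
print(f"n = {n_rcts}, half-width = {half_width:.3f}")
```

Under these assumptions the half-width is about 0.089, which agrees with the stated ± 9%. Setting p = 0.5 maximises p(1 - p), which is why it is the worst case.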

Analysis 

Descriptive analyses of general characteristics of systematic 
reviews 

We will use descriptive statistics to summarise the char- 
acteristics of the systematic reviews included in the study. 
These characteristics include, for example, the clinical 
condition, intervention and comparison type, number of 
primary and secondary outcomes (reported in the review 
protocol and published review), number of RCTs included 
in the review overall, and characteristics of the index 
meta-analysis outcome (outcome definition, meta-analytic 
effect measure, meta-analytical model, and number of 
included RCTs). 

Descriptive analyses of reporting of outcome data eligibility 
criteria and hierarchies in systematic review protocols and 
published reviews 

We will calculate the proportion of systematic review 
protocols and published reviews reporting at least one 
outcome data eligibility criterion and the proportion 
reporting at least one outcome data hierarchy. We will 
also separately calculate the proportion of protocols and 
reviews reporting eligibility criteria and hierarchies in re- 
lation to each of the following types of outcome data 
multiplicity: 1) multiple measurement instruments; 2) 
multiple time points; 3) multiple intervention or control 
groups; 4) final and change from baseline values; 5) sets 
of participants contributing to the analysis (for example, 
intention-to-treat, per-protocol, as-treated); 6) unadjusted 
and adjusted effect estimates; 7) period results in cross- 
over RCTs, and 8) other. Further, we will calculate the 
proportion of systematic reviews with at least one discrepancy
in outcome data eligibility criteria and hierarchies
between the protocol and published review (where a dis- 
crepancy is defined as an addition, removal, or modifica- 
tion of an eligibility criterion or hierarchy). 

Quantifying outcome data multiplicity in RCT reports 

We will calculate the proportion of RCTs with at least 
one type of outcome data multiplicity that is compatible 
with the index meta-analysis outcome as it is defined in 
the review and with the outcome data eligibility criteria 
and hierarchies reported in the review protocol. We will
also calculate the proportion of RCTs with the following 
types of outcome data multiplicity: 1) multiple measure- 
ment instruments; 2) multiple time points; 3) multiple 
intervention or control groups; 4) final and change from 
baseline values; 5) sets of participants contributing to the 
analysis (for example, intention-to-treat, per-protocol, as- 
treated); 6) unadjusted and adjusted effect estimates; 7) 
period results in crossover RCTs, and 8) other. In addition, 
for each RCT we will count the number of effect estimates
that were eligible for inclusion in the index meta-analysis,
and will calculate the median (interquartile range) number
of eligible effect estimates per RCT. We will also quantify
the number of eligible effect estimates that were not in- 
cluded in the index meta-analysis but were included in 
other meta-analyses or elsewhere in the review (for ex- 
ample, tables). 
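
A minimal sketch of these summaries is shown below. The counts are invented for illustration, and the quartile method (`inclusive`) is an assumption, since the protocol does not specify one.

```python
import statistics

# Hypothetical number of eligible effect estimates in each of eight RCTs
eligible_counts = [1, 4, 2, 1, 6, 3, 2, 8]

# An RCT shows multiplicity when more than one estimate is eligible
with_multiplicity = sum(1 for n in eligible_counts if n > 1)
proportion = with_multiplicity / len(eligible_counts)

median = statistics.median(eligible_counts)
q1, _, q3 = statistics.quantiles(eligible_counts, n=4, method="inclusive")

print(f"Proportion with multiplicity: {proportion:.2f}")
print(f"Median (IQR) eligible estimates per RCT: {median} ({q1} to {q3})")
```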

Testing the association between selection of outcome data 
and the magnitude and statistical significance of the effect 
estimate 

When multiple effect estimates are available for inclu- 
sion in a meta-analysis, without pre-specified selection 
rules, several different methods may be acceptable (in 
terms of not introducing bias) for selecting an effect estimate
from the set available. These methods may include:
1) selecting data for the most commonly reported
instrument, time point, or analysis across RCTs; 2) ran- 
dom selection of an effect estimate; 3) selection of the 
median effect estimate, and 4) selection of the outcome 
data based on clinical criteria. The commonality of these 
selection methods is that the selection of the effect esti- 
mate is not based on choosing systematically higher or 
lower effect estimates. If across the RCTs, selection 
methods 1) to 4) are employed, we would expect that 
the distribution of selected effect estimates would be 
consistent with what we would observe under purely 
random selection, although this does not necessarily 
mean that the process used to select the effect estimates 
was indeed random selection. 

We have developed an index, which we call the Poten- 
tial Bias Index (PBI), to assess whether the estimates 
selected for inclusion in the index meta-analysis are
systematically higher or lower than what would be 
expected by purely random selection. This index is based 
on the ordered effect estimates for each trial and the po- 
sitioning (that is, rank) of the effect estimate selected 
within that order. A rank of 1 is assigned to the smallest 
effect estimate and a rank equal to the number of effect 
estimates is assigned to the largest effect estimate. Since 
the number of effect estimates varies across trials we 
rescale the ranks of the effect estimates to reflect their 
relative positioning (in ranking units) between the 
smallest and largest effect estimates. This is obtained by 
subtracting one from the rank of the selected effect esti- 
mate and dividing by the number of effect estimates 
minus one. The smallest effect estimate in a trial then 
has a location of zero and the largest effect estimate has 
a location of 1. So for a trial with three effect estimates
and a chosen effect estimate of rank 2, its location
is (2-1)/(3-1) = 0.5, halfway between the lowest
and highest rank. The Potential Bias Index (PBI) is de- 
fined as the weighted average of the locations of the se- 
lected estimates for each trial, with the weights being 
the number of effect estimates in each trial. With this
weighting, trials that offered more effect estimates to
choose from contribute more to the index.
The expression for PBI is:

$$\mathrm{PBI} = \frac{\sum_{i=1}^{k} n_i \left( \frac{X_i - 1}{n_i - 1} \right)}{\sum_{i=1}^{k} n_i}$$

where there are k trials, $n_i$ is the number of effect estimates
in trial i, and $X_i$ is the rank of the selected effect
estimate in trial i. Derivation of this index and a worked
example is provided in Additional files 3 and 4. Only tri- 
als with more than one effect estimate are included in 
the PBI since a trial with one effect estimate provides no 
information about relative location. When the largest ef- 
fect estimate in each of the trials is selected for inclu- 
sion, the PBI will have the value 1, and conversely PBI = 
0 when the smallest effect estimate is always selected. 
Under a process consistent with random selection, the 
PBI is expected to take the value of 0.5, so, on average 
the chosen effect estimates are at the middle location. 
Similarly, a PBI of 0.75 would indicate that on average 
the effect estimates chosen were 75% of the distance 
between the smallest and largest ranks, or equivalently 
halfway between the middle and highest rank. We have 
constructed a simple statistical test based on the PBI to 
test whether the observed selection of effect estimates is 
consistent with randomness of selection (see Additional 
file 3). Confidence intervals for the PBI can be constructed 
using bootstrap methods by resampling individual trials 
[37]. We will also apply the PBI to assess possible selection
mechanisms in which the effect estimates with the smaller
P-values are chosen for inclusion.
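
The calculation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the function names, the input layout (one pair of selected rank and number of estimates per trial), the percentile form of the bootstrap interval and the example data are all assumptions made for the illustration.

```python
import random


def location(rank, n_estimates):
    """Rescaled rank: 0 for the smallest estimate, 1 for the largest."""
    return (rank - 1) / (n_estimates - 1)


def pbi(trials):
    """Weighted average of locations, weighted by estimates per trial.

    `trials` is a list of (rank_of_selected, n_estimates) pairs. Trials with
    a single estimate carry no information on location and are dropped.
    """
    informative = [(x, n) for x, n in trials if n > 1]
    total_weight = sum(n for _, n in informative)
    return sum(n * location(x, n) for x, n in informative) / total_weight


def bootstrap_ci(trials, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling whole trials with replacement.

    The percentile method is an assumption; the protocol only states that
    bootstrap methods resampling individual trials will be used.
    """
    rng = random.Random(seed)
    informative = [(x, n) for x, n in trials if n > 1]
    stats = sorted(
        pbi([rng.choice(informative) for _ in informative])
        for _ in range(n_boot)
    )
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical data: (rank of selected estimate, number of estimates available)
example = [(2, 3), (4, 4), (1, 2), (1, 1)]
print(pbi(example))
print(bootstrap_ci(example))
```

In the hypothetical data the last trial has only one effect estimate, so it is excluded before the weights are computed.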

Impact of selection of outcome data on meta-analytic
results 

The PBI described above will also be used to compare 
the index meta-analytic effect estimates with all possible 
meta-analytic effects. For each meta-analysis, all possible 
meta-analytic effects will be calculated from all combi- 
nations of available RCT effect estimates. The meta- 
analysis model used to combine the estimates (either 
fixed or random effects) will be the model that was used 
in the systematic review. However, sensitivity analyses 
will be undertaken to examine whether the type of 
meta-analysis model affects the PBI. 
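
As a rough illustration of how all possible meta-analytic effects might be enumerated, the sketch below pools every combination that takes exactly one effect estimate from each RCT. It assumes a fixed-effect inverse-variance model purely for brevity (the protocol uses whichever model the review used), and the function names, the data layout and the tie-handling rule for ranks are hypothetical.

```python
from itertools import product


def fixed_effect(estimates):
    """Inverse-variance pooled estimate from (effect, standard_error) pairs."""
    weights = [1 / se**2 for _, se in estimates]
    weighted_sum = sum(w * effect for w, (effect, _) in zip(weights, estimates))
    return weighted_sum / sum(weights)


def all_possible_pooled(trials):
    """Pool every combination with exactly one estimate chosen per trial."""
    return [fixed_effect(combo) for combo in product(*trials)]


def location_of_index(index_value, possible):
    """Rescaled rank of the index result among all possible results.

    Assumes more than one possible result. Ties are resolved by counting
    only strictly smaller values, which is a choice made for this sketch.
    """
    rank = 1 + sum(1 for value in possible if value < index_value)
    return (rank - 1) / (len(possible) - 1)


# Hypothetical meta-analysis: each trial lists its available
# (effect estimate, standard error) pairs.
trials = [
    [(0.2, 0.1), (0.5, 0.1)],
    [(0.3, 0.2)],
    [(0.1, 0.1), (0.45, 0.1), (0.6, 0.1)],
]
possible = all_possible_pooled(trials)
index_effect = fixed_effect([(0.5, 0.1), (0.3, 0.2), (0.45, 0.1)])
print(len(possible), location_of_index(index_effect, possible))
```

The number of combinations is the product of the numbers of estimates per trial, so it can grow quickly when many trials have multiplicity.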

We will also investigate the impact of the selected 
RCT effects on the magnitude of the resulting meta- 
analytic effect estimates. For each meta-analysis, the 
difference between the index meta-analytic effect esti- 
mate and the median of all possible meta-analytic effect 
estimates will be calculated. These differences will be 
standardised (by dividing by the pooled baseline SD of 
the outcome) and meta-analysed using a random-effects 
model across reviews. The meta-analytic weights will be 
based on the standardised standard error of the median 
meta-analytic estimates, and between RCT variability will be estimated
using DerSimonian and Laird's method of moments
estimator [38]. Note that this approach ignores
the correlation between the meta-analytic effects within 
meta-analysis, arising from correlated RCT effects. 
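
The standardisation and pooling steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names and inputs are hypothetical, and the second function is the standard DerSimonian and Laird method of moments calculation applied to whatever standardised differences and standard errors are supplied.

```python
from statistics import median


def standardised_difference(index_effect, possible_effects, pooled_sd):
    """Index effect minus the median of all possible effects, in SD units."""
    return (index_effect - median(possible_effects)) / pooled_sd


def dersimonian_laird(effects, std_errors):
    """Random-effects pooled estimate, its standard error, and tau^2."""
    w = [1 / se**2 for se in std_errors]
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    # Method of moments estimate of between-unit variance, truncated at zero
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (se**2 + tau2) for se in std_errors]
    pooled = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    return pooled, (1 / sum(w_star)) ** 0.5, tau2


# Hypothetical standardised differences and standard errors, one per review
differences = [0.05, 0.20, -0.02, 0.10]
std_errors = [0.04, 0.08, 0.05, 0.06]
print(dersimonian_laird(differences, std_errors))
```

As the text notes, treating each review's difference as an independent input ignores the correlation among meta-analytic effects within a meta-analysis.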

Subgroup analyses 

We will examine whether the existence of 1) a system- 
atic review protocol and 2) a core outcome set being 
available for the clinical condition of the review affects 
a) the specificity of outcome data eligibility criteria and 
hierarchies reported in systematic review protocols and 
published reviews; b) the proportion of RCTs with multi- 
plicity and the proportion of systematic reviews with at 
least one RCT with multiplicity; c) the PBI of the RCT 
effect estimates selected for inclusion in the index meta- 
analysis, and d) the PBI of the resulting index meta- 
analytic effect estimates. 

Sensitivity analyses 

For systematic reviews without protocols, it is not known 
whether the outcome eligibility criteria reported in the 
methods section of the review were specified prior, or sub- 
sequent to undertaking the review. Therefore, for our pri- 
mary analyses, we have chosen to include the set of RCT 
effect estimates that are compatible with the assumption 
of no pre-specified outcome data eligibility criteria. How- 
ever, through sensitivity analyses, we plan to investigate if 
the PBIs (calculated at both the RCT and meta-analysis 
level) are modified when the set of RCT effect estimates
is restricted to those that are compatible with the
outcome data eligibility criteria and hierarchies specified 
in the methods section of the review. 

Discussion 

To our knowledge, this is the first empirical study 
designed to investigate the association between selection 
of RCT outcome data included in a meta-analysis and 
the magnitude and statistical significance of the RCT 
result. In publishing this protocol we are following the 
lead of others who have encouraged the pre-specification 
and transparent reporting of the objectives and design of 
methodological studies [39-43]. 

There are several strengths of our study. We will use 
systematic review methods to identify eligible reviews, 
including use of explicit inclusion criteria, sensitive search 
strategies, duplicate selection of reviews, and standardised 
and pilot-tested data extraction forms. We will perform 
double data-extraction on a random sample of reviews 
and their included RCTs, and will consider performing 
this on the complete sample if the data extraction discrep- 
ancy rate is high. In addition to exploring whether there is 
evidence of selective inclusion of RCT results in system- 
atic reviews, we will examine what the potential impact of 
this is on meta-analytic estimates. 

There are also several limitations to our study. We are fo- 
cusing only on meta-analyses of continuous outcomes, and 
so will not investigate potential selective inclusion arising 
from types of multiplicity unique to dichotomous outcomes 
(for example, binary events defined in multiple ways, or 
continuous measurement instruments dichotomised using 
different cut-points). Our study is also limited to systematic 
reviews of RA, OA, depressive disorders and anxiety 
disorders. Some of the continuous outcomes likely to be in- 
cluded in our study (for example, pain, function, and quality 
of life) exist in systematic reviews of other conditions (such 
as low-back pain), but our findings may have limited gener- 
alisability to other clinical areas. However, our focus on 
continuous outcomes and these clinical areas enables us to 
examine the impact of core outcome sets on selective inclu- 
sion of results. Finally, our study will only investigate the 
existence of potential bias in meta-analytic effect estimates 
that can result from systematic reviewers' selective inclusion 
of results reported by trialists. It is possible that the effect 
estimate(s) available in an RCT publication may have been 
selectively reported by the trialists (for example, data col- 
lected using other measurement scales may have been 
omitted based on the results). Therefore, both selective 
reporting by trialists and selective inclusion by systematic 
reviewers may in combination bias the results of a meta- 
analysis [6]; however, our analysis will only examine the 
latter. 

Meta-analysis results are of interest to various stake- 
holders and are used to inform clinical practice and pol- 
icy decisions. If the results of meta-analyses are biased
by selective inclusion of results, additional methods 
guidance for systematic review conduct and reporting 
will be necessary. Systematic review organisations have 
only recently recommended that systematic reviewers 
pre-specify eligible measurement instruments and time 
points in their protocols [7,8]. This advice may need to 
be extended to encompass other common types of out- 
come data multiplicity. 